Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering

Authors

Abstract

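This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. This means that the compactness plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a cluster quality measure. We also show that the average compactness of each feature with respect to objects and/or clusters over several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.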
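The agglomeration idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact definitions of concept size and compactness are not given in the abstract, so the sketch assumes each object is stored as per-feature quantile values, takes the join of two descriptions as the bin-wise min/max envelope, and measures compactness as the mean bin width normalized by the feature span. All function names (`to_bins`, `join`, `feature_compactness`, `agglomerate`) and the toy data are hypothetical.

```python
import numpy as np
from itertools import combinations

def to_bins(quantiles):
    """Turn a (d, m+1) array of per-feature quantile values into
    bin lower/upper bounds, each of shape (d, m)."""
    q = np.asarray(quantiles, dtype=float)
    return q[:, :-1], q[:, 1:]

def join(a, b):
    """Assumed join: bin-wise envelope (min of lower, max of upper bounds)."""
    return np.minimum(a[0], b[0]), np.maximum(a[1], b[1])

def feature_compactness(cluster, span):
    """Assumed compactness per feature: mean normalized bin width."""
    lo, hi = cluster
    return ((hi - lo) / span[:, None]).mean(axis=1)

def agglomerate(objects):
    """Greedy agglomeration: at each step merge the pair whose join has
    the smallest compactness. Returns the merge list and the per-feature
    compactness averaged over all merge steps."""
    clusters = {i: to_bins(q) for i, q in enumerate(objects)}
    lo_all = np.min([c[0] for c in clusters.values()], axis=0)
    hi_all = np.max([c[1] for c in clusters.values()], axis=0)
    # Span of each feature over the whole data set (assumed non-zero).
    span = hi_all.max(axis=1) - lo_all.min(axis=1)
    merges, per_feature = [], []
    next_id = len(clusters)
    while len(clusters) > 1:
        i, j = min(
            combinations(clusters, 2),
            key=lambda p: feature_compactness(
                join(clusters[p[0]], clusters[p[1]]), span).mean(),
        )
        merged = join(clusters.pop(i), clusters.pop(j))
        fc = feature_compactness(merged, span)
        merges.append((i, j, float(fc.mean())))
        per_feature.append(fc)
        clusters[next_id] = merged
        next_id += 1
    return merges, np.mean(per_feature, axis=0)

# Toy data (made up): 4 objects, 2 features, 2 bins per feature.
objects = [
    [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]],
    [[0.1, 1.1, 2.1], [0.0, 1.0, 2.0]],
    [[8.0, 9.0, 10.0], [0.0, 1.0, 2.0]],
    [[8.5, 9.5, 10.5], [0.0, 1.0, 2.0]],
]
merges, avg_compactness = agglomerate(objects)
```

Under these assumptions, `merges` records which clusters were joined at each step together with the compactness of the result, and `avg_compactness` holds one value per feature; the abstract's selection criterion would then favor the features with the smallest averages.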


Related Resources

Histogram Clustering for Unsupervised

This paper introduces a novel statistical mixture model for probabilistic grouping of distributional (histogram) data. Adopting the Bayesian framework, we propose to perform annealed maximum a posteriori estimation to compute optimal clustering solutions. In order to accelerate the optimization process, an efficient multiscale formulation is developed. We present a prototypical application of thi...

Exploiting Hierarchical Structures for Unsupervised Feature Selection

Feature selection has been proven to be effective and efficient in preparing high-dimensional data for many mining and learning tasks. Features of real-world high-dimensional data such as words of documents, pixels of images and genes of microarray data, usually present inherent hierarchical structures. In a hierarchical structure, features could share certain properties. Such information has b...

Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects

Contemporary datasets are becoming increasingly larger and more complex, while techniques to analyse them are becoming more and more inadequate. Thus, new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. However, histogram-valued data are difficult to handle computationally because observations typically have a different numbe...

Hierarchical and Pyramidal Clustering for Symbolic Data

This paper presents a method for clustering a set of symbolic data where individuals are described by symbolic variables of various types: interval, categorical multi-valued or modal variables, which take into account the variability or uncertainty present in the data. Hierarchical and pyramidal clustering models are considered. The constructed clusters correspond to concepts, that is, they are...

Journal

Journal title: Stats

Year: 2021

ISSN: 2571-905X

DOI: https://doi.org/10.3390/stats4020024